A Recursive Statistical Translation Model
Authors
Abstract
A new model for statistical translation is presented. A novel feature of this model is that the alignments it produces are hierarchically arranged. The generative process begins by splitting the input sentence into two parts. Each part is translated by a recursive application of the model, and the resulting translations are then concatenated. If the sentence is short enough, a simpler model (in our case, IBM's Model 1) is applied instead. The training of the model is explained. Finally, the model is evaluated using the corpora from a large-vocabulary shared task.
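The recursive structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `translate_recursive`, the `choose_split` callback, the `max_base_len` threshold, and the dictionary lookup standing in for IBM's Model 1 are all assumptions made for illustration. The abstract does not say how the split point is chosen or how short a sentence must be before the simpler model takes over, so both are left as parameters here.

```python
from typing import Callable, Dict, List


def translate_recursive(
    words: List[str],
    lexicon: Dict[str, str],
    choose_split: Callable[[List[str]], int],
    max_base_len: int = 2,
) -> List[str]:
    """Hypothetical sketch: split, translate each part recursively, concatenate."""
    # Base case: the segment is short enough for the simpler model.
    # A plain word-for-word lookup stands in for IBM's Model 1 here (assumption);
    # unknown words are copied through unchanged.
    if len(words) <= max(1, max_base_len):
        return [lexicon.get(w, w) for w in words]

    # Recursive case: pick a cut point and clamp it so both parts are non-empty.
    cut = choose_split(words)
    cut = max(1, min(cut, len(words) - 1))

    left = translate_recursive(words[:cut], lexicon, choose_split, max_base_len)
    right = translate_recursive(words[cut:], lexicon, choose_split, max_base_len)

    # The translations of the two parts are concatenated.
    return left + right


if __name__ == "__main__":
    toy_lexicon = {"la": "the", "casa": "house", "es": "is", "grande": "big"}
    source = ["la", "casa", "es", "grande"]
    # Splitting in the middle is an arbitrary placeholder for the model's choice.
    print(translate_recursive(source, toy_lexicon, lambda ws: len(ws) // 2))
```

Because each call records where it cut the segment, the sequence of splits forms a binary tree over the input sentence, which is one way to read the abstract's statement that the resulting alignments are hierarchically arranged.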
Similar resources
A Recursive Recurrent Neural Network for Statistical Machine Translation
In this paper, we propose a novel recursive recurrent neural network (R2NN) to model the end-to-end decoding process for statistical machine translation. R2NN is a combination of recursive neural network and recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that la...
Bilingually-Constrained Recursive Neural Networks with Syntactic Constraints for Hierarchical Translation Model
Hierarchical phrase-based translation models have advanced statistical machine translation (SMT). Because such models can improve leveraging of syntactic information, two types of methods (leveraging source parsing and leveraging shallow parsing) are applied to introduce syntactic constraints into translation models. In this paper, we propose a bilingually-constrained recursive neural network (...
Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation
Learning semantic representations and tree structures of bilingual phrases is beneficial for statistical machine translation. In this paper, we propose a new neural network model called Bilingual Correspondence Recursive Autoencoder (BCorrRAE) to model bilingual phrases in translation. We incorporate word alignments into BCorrRAE to allow it to freely access bilingual constraints at different leve...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Use of maximum entropy in natural word generation for statistical concept-based speech-to-speech translation
Our statistical concept-based spoken language translation method consists of three cascaded components: natural language understanding, natural concept generation and natural word generation. In the previous approaches, statistical models are used only in the first two components. In this paper, a novel maximum-entropy-based statistical natural word generation algorithm is proposed that takes i...